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The Zwanzig-Mori projection formalism is widely used in studying systems with many degrees 
of freedom. Recently Xing and Kim used the projection formalism and derived the generalized 
Langevin equations (GLEs) for a general stochastic system not necessarily obeying detailed balance. 
In this study we develop a numerical procedure to reconstruct the GLEs from data. Numerical tests 
on two biological networks show remarkable agreement between the results calculated from the 
reconstructed GLEs and those of full model simulations. We suggest that the procedure can be 
applied in model reduction and a novel way of nonlinear time series analysis. 
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I. INTRODUCTION 



It is common to study dynamics of a system with a 
large number of degrees of freedom in almost every sci- 
entific field. In general it is impractical, and often unnec- 
essary, to track all the dynamic information of the whole 
system. Furthermore even if the dynamic information of 
the whole system is available, a set of equations of mo- 
tion or representations at reduced dimension are often 
desirable to reveal the system dynamics more transpar- 
ently and informatively, or to allow efficient description 
of long time dynamics. Consider, for example, a protein 
may have tens of thousands of atoms interacting with 
an even larger number of solvent molecules and other 
molecules. However, to provide insights of the protein 
functional mechanism it is often needed to monitor a few 
number of collective modes. 

In statistical physics, the celebrated Zwanzig-Mori pro- 
jection approach is a general and formal procedure to 
derive the equations of motion of a set of selected pri- 
mary degrees of freedom, while the remaining secondary 
degrees of freedom manifest their effects on the primary 
degrees implicitly. This basic idea can be dated back to 
Einstein on his treatment on Brownian motion [l|. In- 
spired by techniques from quantum mechanics, Zwanzig 
provided a formal procedure leading to a set of governing 
equations in the form of generalized Langevin equations 
(GLEs) [2J, y] ■ I n the limit that there is clear time scale 
separation between the primary and secondary degrees of 
freedom, the GLEs can reduce into Langevin equations. 
Mori further derived a simplified version of the equations 
that is " inherently linear in the system variables" [J, Q . 
The Z-M procedure is typically adopted to studies on 
closed Hamiltonian systems that relax to thermodynamic 
equilibrium in the long time limit. One notable recent 
development is the work of Lange and Grubmuller to de- 
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rive the dynamic equations of some collective coordinates 
with the Zwanzig projection procedure [6|. 



The success of the Zwanzig-Mori projection approach 
leads to great interest to apply it to general non- 
Hamiltonian systems out of thermal dynamic equilib- 
rium, which may be dynamically highly inhomogeneous, 
and have fascinating emerging properties such as oscilla- 
tions and bifurcations. Examples include climate changes 
in one place, stock market fluctuations, and collective dy- 
namics of ~ 10 11 neurons in a brain, stochastic dynamics 
of gene regulatory networks. It is of both theoretical and 
practical significances to examine the applicability of the 
projection formalism to complex systems, if yes, what 
new features of the GLEs exist. Chorin and coworkers 
have made considerable efforts in this field [3, |8| . Using a 
time series analysis approach[9], Erban et al. showed how 
one can reconstruct a one-dimensional Fokkcr-Planck 
equation, or the equivalent Langevin equation, to reca- 
pitulate the dynamics of a gene regulatory network such 
as a toggle switch. Kawai and Komatsuzaki derived the 
GLEs in nonstationary environments [10] . 



Recently Xing and Kim applied the projection ap- 
proach to nonequilbrium non-Hamiltonian systems with 
stochastic dynamics and derived the corresponding 
GLEs. In this paper we will further present a numeri- 
cal procedure on how to reconstruct GLEs in a reduced 
representation from full model simulations. A prominent 
feature of the procedure is that all the model parameters 
are extracted directly from the data, in the same spirit 
of Erban et a/[9( . The remaining part of the paper is or- 
ganized as follows: in Sec. |TT] we summarize the theory 
of the Zwanzig-Mori projection and its generalization to 
non-Hamiltonian systems; in Sec. IHII we present the nu- 
merical approach to extract the GLE model parameters; 
Sec. II VI shows numerical tests on two model systems; and 
we conclude with discussions in Sec. W\ 



II. THEORY 



First we give a heuristic presentation on the projection 
approach, following the discussions given in [5[ with some 
modifications, and focusing on the Hamiltonian systems. 

Consider a system described by the Hamiltonian, 



%p) = E? 



+ V(x) 



(1) 



where x and p are position and conjugate momentum 
vectors. We will use mass-weighted coordinates here. 
The Liouville operator L is defined as, 
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For an arbitrary dynamic variable A, the projection op- 
erator is defined as, 



PA = ^(A^X^)- 1 ' 



':;■ 



(3) 



where {</>(x, p)} composes the basis set for the projected 
subspace. The inner product for two arbitrary variables 
A and B is defined as, 

(A,B) = <A ] B > 

JA^Bexp(-/3H)dxdp 

J exp (—(3H) dxdp 

where t means taking transpose and complex conjugate. 
Any dynamic variable within the subspace can be ex- 
pressed as a linear combination of the basis functions. 
The projected equations of an arbitrary dynamic vari- 
able A, which is defined within the projected subspace, 
are given by, 
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where 



F(t) = exp(i(l-P)i)(l 
K(t) = -(LF{t),<f>)'{M 
= (F(t),L<t>)-(cf>,cl>)- 
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The last equation leads to the generalized fluctuation- 
dissipation relation, and we have used the anti-Hermitian 
property of the Liouville operator. 

In practice the basis sets are usually chosen as portion 
of the coordinate vector x and the corresponding conju- 
gate momentum vector p. Mori derived a GLE that is 
linear to the coordinates and momentum ]$. However, in 
principle this restriction is unnecessary. One can expand 
the Hilbert space to include high order combinations of 
the coordinates and momentum. Appendix |A1 gives such 



an analytic example. Therefore it should be clear that 
while the projection is performed in the linear Hilbert 
space, the resultant GLEs can be nonlinear to the coordi- 
nate variable. Especially below let's consider the extreme 
limit of including all the possible Hilbert functions com- 
posed by the coordinate and velocity (or momentum) in 
reduced dimension. The following procedure is analogous 
to what adopted by Zwanzig 3] . 

For simplicity let's focus on projecting onto a one- 
dimensional manifold c and its conjugate momentum, 
while generalization to higher dimensions is straightfor- 
ward. Noticing that a possible choice of the basis set of 
the Hilbert space is {x, x, xx, • • • }, one can expand c and 
c as 

c = /(x) = /(0) + V x /(0) • x + 1 V xx /(0) : xx + . .(.7) 



c = V x / • x = V x /(0) • x + Vxx/(0) : xx 



(8) 



Therefore c and c are vectors in the full Hilbert space. 
Let's consider the sub- Hilbert space, which may still have 
infinite dimension, supported by all the possible multi- 
plicative combinations of c and c , such as c 2 , cc 3 .... 
A key observation is that these basis functions, denoted 
{4>(c, c)}, compose a complete basis set for the subspace 
so any arbitrary function of (c, c), can be fully expressed 
by the basis set. That is, for an arbitrary function 
g(x, p), the inner product becomes 

^2 / 9<t>i exp(-/3iJ)dxdp 

i 

= ^-r I gexp(-0H)S(f - c)S(V x f ■ p - c)dxdp 
P{c, c) J 

where 

pie, c) = J eM-PHW - c)<5(V x / • p - c)rfxdp (9) 

The above expression may be more familiar if the Dirac 
bra and ket notations are used. Then one obtains explicit 
expressions for the first term on the right hand side of 
Eqn. S 



PL-c = c 



PL-c 
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Jexp(-0H) 



(10) 



p(c, c) d c 

||V X /|| 2 <5(/ - c)5(V x / • p - c)dxdp (11) 

To derive the above expression, we perform integration 
by parts, and use the relations, 

V x <5(c - /) = V x fd f 5(x - f) = V x fd c 5(x - /) 
V x <5(c - V x / • p) = V x (V x /-p)9e<5(c-V x /-p) 
V p cJ(c-V x /-p) = V x /%J(c-V x /-p) (12) 

Compared to Eqn. 111! the result derived by Lange and 
Grubmuller has an extra term ||V X /|| 2 in the expression 
of p(c, c) [6( . The discrepancy may come from the fact 



that the projection operator defined in [6j does not rig- 
orously satisfy P 2 = P. It remains to be examined on 
how this extra term may affect the dynamics. In the case 
/ is a linear combination of x, and is chosen to satisfy 
||V X /|| 2 = 1 , Eqn. QT] gives the familiar relation to the 
gradient of potential of mean force. 

Recently Xing and Kimfll] applied the Zwanzig-Mori 
projection procedure to a general dynamic system de- 
scribed by a set of stochastic differential equations, 
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c i (x)+y; 5 «(x)^(t)i = i J 



,N 



(13) 



In general the vector G(x) can not be represented as 
the gradient of a scalar potential due to violation of de- 
tailed balance, M and N may be different, £i(t) are tem- 
porally uncorrelated, statistically independent Gaussian 
white noise with the averages satisfying (6(i)0( T )) = 
5ij6(t — t), g(x) is related to the N x N diffusion matrix 
gg T — 2D, where the transpose of a matrix is designated 
by the superscript T. The resultant GLEs for projection 
to X, components of x, assume the form 







_d_ 
"dX, 

rt 



w(x)-E r °^-w 



dsT 1<ij (s)X j (t-s)+T i (t) (14) 



n 



where W is the potential of mean force, Tq is in gen- 
eral not symmetric, reflecting violation of detailed bal- 
ance, and the generalized fluctuation-dissipation theorem 
reads, 

< FityFtf) >= (r 0ltf + T 0>ji )6(t) + r ltij (t - t') (15) 

where t > t' . For simplicity in the remaining discussions 
we focus on the case that the projected system is one- 
dimensional, while generalization to higher dimensions is 
straightforward. 



III. 



METHODS 



A. Discretization of the Generalized Langevin 
Equation 

In this work, we assume that the full dynamics of the 
system under study, or the time series data of the pri- 
mary degrees of freedom are available, and our goal is to 
reconstruct the dynamic equation in the reduced dimen- 
sion from the data. 

Consider a generic 1-D generalized Langevin equation, 







dW 
dX{t) 



T X(t) + / ri(s)X(i - s)da + F{t) 
Jo 

(16) 

we first integrate it from time iAt to (i + l)At for dis- 
crete time steps At, and perform ensemble average over 



the random force. Notice that the random force F(i) 
averages out to 0, then EqnfTB] becomes, 



JiAt OA 

rO+l)Ai 



(17) 
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(T X(t)) + (l T 1 (a)X{t-a)da) 

o 



This can further be approximated as, 



"dxiny 



At 



(X(t i+1 ) - XQki)) 



(18) 



+Y,ri{t j+1/2 )((x(t i - i +i) -x{U-j))) 

3=1 J 

Alternatively, by multiplying X(to) on both sides of 
Eqn. [TH] before performing average and integration, one 
obtains an equation for the autocorrelation function 
counterpart of Eqn. [TBI 



dW 

- ™ w 



(19) 



+ | § [(X(t )X(U +1 )) - (X(t )X(U)}} 



+ X! r i(*i+i/a) l(X(t )X(ti- j+l )) - (X(to)X(ti-i))] 

3=1 

The potential of mean force W can be obtained directly 
from the stationary distribution p ss ix) from W(X) — 
— lnp ss (X), and all the ensemble averaged terms ((•)) 
can be obtained from the data. 



B. Smoothing the Memory Kernel with Tikhonov 
Regularization 

Both Eq ns . \TE\ and [T9l are linear equations of the terms 
To and Ti(t). For example, Eqn llSI can be rewritten in 
matrix form as, 



<9 X W - AXT = 



(20) 



Here <9 X W,- 



dW 

\ ax(u) i 



elements of the square matrix 



AXij — {X(ti-j) — X (ti-j+i)} for j < i and otherwise 
and Ti = r(tj +1 / 2 )- However, it is not numerically desir- 
able to solve any of them directly, since numerical errors 
in 9 x Wj and AX accumulate quickly and lead to inac- 
curate and unstable prediction of the GLE parameters. 
Instead we regularize the data using the Tikhonov regu- 
larization (also know as Ridge regularization) [12j. Sim- 
ply speaking, the Tikhonov regularization adds a penalty 
term to damp the highly oscillating components of the es- 
timator. Mathematically, we need to minimize the func- 
tion; 



,w - Ax-ri 



IP-ri 



(21) 



In this work we choose Py = b, for i = j and = — b for 

i = j + 1 and otherwise, except for the first and the last 
row where all elements are 0. the term b is a preselected 
constant, which we will refer as the penalization factor. 
This minimization yields a simple solution of the form, 



T= (AX T -AX + P T P) AX T -d x W 



(22) 



We assume that F\ is a smooth function of t, and thus the 
penalty function P is chosen to minimize the derivatives, 
or differences between T± at two consecutive time points. 
However, use of the function P may also lose meaningful 
information and distort the result. Hence the penalty 
factor, 6, must be chosen wisely. Below we demonstrate 
how to choose b properly using two examples. 



IV. RESULTS 

A. End— product inhibition motif 

To demonstrate the strength of our parameter free 
projection method, we first apply it to a simple non- 
linear chemical network. This network is an end prod- 
uct inhibition motif found in metabolic and other biol- 
ogy networks. The reactions are governed by irreversible 
Michaelis-Mcntcn kinetics, 



Xi 



V m Xi 



K m + x A K m + x\ 



9ti(t), 



and 



C. Generating random force J-(t) 

Using Eqn.fTBIand with W, To, and Ti(t) determined, 
we simulate the GLE using the following discretized ver- 
sion, 

o - yw(x(u-i)) At _ To {x{u) x{ti _ i)} (23) 

- At J2 ri ( (k + i J At\ (X(U- k ) - X(fi-k-i)) 

dtF{t) 

V 

We use the method of Berkowitz ct al.[l3| to generate 
the random forces, with each realization given by 



*;=! 



-2— (sin/wfczAi) — sin(wfc(i — l)At)) 

Wfe 



(cos(wfczAt) 



cos(wfe(i — l)Ai)) 



where £„& and ^6fe are random numbers drawn from inde- 
pendent random gaussian distributions, Uk — 2irk/{L5t) 
and L is the number of time steps over which the 
pseudo-random forces repeat. Therefore LSt should be 
no less than the simulation time. The spectral density 
J(ui) is determined using the memory kernel through the 
Wiener-Khintchine!14] theorem, 



J(«) 



dtT(t) cos(wt) 



(25) 



4 (r + / dtTi(t) cos(ut) 



where T(t) = 2T S(t) + Ti(t). Calculation of the spec- 
tral density may need Ti(t) at time points finer that 
those retrieved in Section IIIIB| e.g., within (i — l)At 
and iAt, which are obtained through linear interpolation 
using Ti((i — l)Ai) and Ti(iAi)in this work. 



vm'Ei—l 



';()■'/ 



^m ~r %i — 1 -t^nn T" %t 



+ ^i(*)* = 2,3,4 (26) 



Values v m — 1, K m = 0.5, g = 0.005 are used in the sim- 
ulations. The system initially relaxes to the steady-state. 
We choose x\ as our species of interest. We determine the 
potential of the mean force defined as W(x\) — — \n(p ss ), 
where p ss is the stationary state distribution of x\. At 
time 0, the concentration of xi(0) is jumped to 2.0. The 
relaxation dynamics {x\{t)), which is defined as the value 
of x\ averaged over all trajectories, is recorded for ev- 
ery time step with an interval of At =0.1. In order to 
compute the memory kernel (To, Ti(i)) we first obtain 
the mean force through histogram counting and fit the 
entire W(x\) with 20 piecewise quadratic functions to fa- 
cilitate the derivative calculations. Next while recording 
the (xi(t)) we also record the (fjr-) for every time step 
determined using the functional form determined using 
the piecewise quadratic fits. 

Actually this model system has been studied in our 
earlier work [111 ], where the memory kernel is obtained 
by fitting with an ansatz of the function form. Here it 
is calculated directly using Eqn. [2U The time inde- 
pendent part of the memory kernel, Tq obtained for this 
system is 28.5, the time dependent memory kernel, Ti(f) 
is shown in Fig [TJ M, the number of time steps over 
which the pseudo-random forces repeat is chosen to be 
1000 for the GLE whereas the At = 0.1. Both agree 
well with the previous results [11], but differ in subtle 
details. As shown in Fig[T]C, this subtle difference leads 
to remarkable improvement on the agreement between 
the GLE result and the full model calculation, compared 
to the previous work. Notice that both Ti(t) and x\(t) 
show damped oscillations with similar frequency. Fur- 
thermore, with the same memory kernel as shown in 
Fig [TJ we predict the relaxation dynamics with different 
values of xi(0) = 1.3,0.87 and 0.54. Again, the results 
in Fig. [2] show striking agreements with the full model 
calculations. In general it is a question to what extent 
the GLE parameters are transferable from one situation 
to another one. The results here give a positive answer. 



B. ER-GFR Survival Signaling Switch 



V. 



DISCUSSIONS 



Next we examine a more challenging system shown 
in Fig [3]A It is a phenomenological model proposed by 
Tyson et. al. to capture the crosstalk between the growth 
factor (GF) and esptrogen receptor (ER) signaling in 
cancer studies [15|. All components in GF signaling are 
lumped into the black box named 'GFR', the growth fac- 
tor receptor. Extracellular estrogen, E2-bound esptrogen 
receptor, ER (ER:E2) inhibits GFR. Withdrawn of E2 
promotes activation of GFR, which then activates ERP 
(phosphorylated ER) and ERP:E2 (E2 bound phospho- 
rylated ER complex). ERP facilitates/stabilizes GFR 
activation. Further details of this model can be found 
in [15[ . A remarkable feature of this system is that it can 
have bistable dynamics. Appendix [B] gives the detailed 
scheme for the full model and the GLE simulations. We 
use chemical Langevin equations to simulate the stochas- 
tic dynamics. Table Q] in Appendix [B] gives the model 
parameters. 



Figure[5j3 shows that this bistable system has a double- 
well shaped potential of mean force, with the more stable 
(left) state corresponding to the low GFR activity state. 
To generate the data for reconstructing the GLE, we ini- 
tialize the system within the right well corresponding to 
the high GFR activity state. Fig. |3p gives the obtained 
memory kernel Ti (i) . It seems to reach zero at t ~ 5 x 10 4 
mins. As our first attempt, we hence set T\(t) — for 
t > 5 x 10 4 . However, the blue curve in Fig. [5p shows 
that the GLE result does not agree well with the full 
model simulation results except at an early stage and the 
long time behavior. For the latter it is because the long 
time steady state behavior is governed by the potential 
of mean force. For the former the early stage dynamics is 
governed by Tq. The discrepancy in the middle stage sug- 
gests that Ti is not sufficiently long. Physically, we think 
that the relaxation process involves the fast intrawell dy- 
namics and the slow inter- well dynamics. In this case the 
inter-well transition dynamics is beyond a simple Marko- 
vian process, but shows correlation among transitions. 
Correspondingly, the memory kernel Ti shows a rather 
long tail (Fig. |3P), and a biphasic relaxation dynamics 
for x(t) (Fig. [3f)). Indeed using the evaluated Ti(t) up 
to t = 10 8 mins, we obtain an excellent agreement be- 
tween the GLE (red curve in Fig. 03) and the full model 
simulations. 



Next, we use the reconstructed GLE to predict the first 
passage time distribution for the transition from the right 
well (high GFR) to the left well (low GFR). Fig. H shows 
excellent agreement between the GLE and full model re- 
sults. It is a highly nontrivial achievement to reproduce 
the whole distribution, not just a few moments, such as 
the mean first passage time. 



Reconstruction of the governing equations is an im- 
portant step towards understanding a dynamical system. 
The Zwanzig-Mori projection method provides a rigor- 
ous theoretical formalism. In this work, we present a 
numerical procedure to reconstruct the GLEs from the 
data. Numerical tests on two model systems show that 
the accuracy of the algorithm is encouraging. 

Eqn. [4] is a mathematically exact solution. It is equiv- 
alent to but mathematically more involved than the orig- 
inal dynamic equations. The practical usage of the pro- 
jection approach lies in the assumption that the effect 
of the implicitly treated secondary degrees of freedom 
can be well replaced by the potential of mean force, the 
memory kernel with some simple function forms, and the 
related noises with certain statistical properties. There- 
fore one expects that the GLEs after such approximation 
in the form of Eqn. Q3] work best for systems coupled 
to many degrees of freedom. Einstein shows a classical 
example that a single drag coefficient and a correspond- 
ing white noise term can well describe the dynamics of 
a Brownian particle influenced by its interaction with 
Avogadro number of solvent molecules. Then it is out of 
surprise that for both of the two models, which have only 
a small number of degrees of freedom, the GLE results 
and the full model simulations agree remarkably well. It 
may be partly due to the fact that the full system dy- 
namics is governed by stochastic differential equations, 
and the stochastic noise can be viewed as generated by 
a bath system with infinite number of degrees of free- 
dom [16j |. It remains to examine to what extent one can 
use a coordinate-independent memory kernel form to de- 
scribe a dynamic system. 

Most molecular systems are well modeled by some sim- 
ple memory kernel forms such as an exponentially de- 
cayed function. Recent single molecule studies reveal 
a power-law form for describing intramolecular fluctu- 
ations [13| • Studies on the two model systems in this 
work show that the function form can be more complex 
for systems kept out of equilibrium. Can one find a set 
of common function forms corresponding to systems with 
different dynamic behaviors? Let's ask the problem in 
an alternative way. In two or higher dimensions, the 
GLEs in Eqn. Q3] share similar form as those obtained for 
Hamiltonian systems relaxing to equilibrium, except that 
the matrix Tq is not symmetric. For a one-dimensional 
GLE, however, the form is the same. Then can one tell 
whether it describes a closed or open system from the 
equation itself? 

In this work we perform numerical simulations with the 
stochastic GLEs to demonstrate their accuracy. Simulat- 
ing GLEs with colored noise is computationally expen- 
sive, especially if the noise spectrum has a long tail. In 
practical applications, in general there is no need to per- 
form such full stochastic simulations. Instead one may 
just need to calculate the first few moments of the dis- 
tribution, or work with the corresponding Fokker-Planck 



equation. 

Our numerical studies also reveal that the quality of 
a reconstructed GLE is very sensitive to the accuracy of 
the potential of mean force, which then requires well con- 
verged sampling. Some systematic studies and statistical 
tools are needed to improve the accuracy of reconstruc- 
tion with less sufficient data. In our current algorithm, 
we first obtain the potential of mean force, and then the 
memory kernel that is affected by the form of the former. 
It may be desirable to have a procedure to reconstruct 
the two simultaneously. 

Model 2 examined in this work shows a very long 
memory kernel, which includes both intra- and inter- well 
dynamics. One possible reason is that Agfr is not a 
good choice for the reaction coordinate, and the dynam- 
ics along the orthogonal degrees of freedom is comparable 
or even slower than that along Agfr- Lange and Grub- 
muller demonstrated how to reconstruct a GLE along a 
nonlinear collective coordinate [6| . One may follow simi- 
lar procedure to examine whether a short memory kernel 
can be obtained, and whether the dynamics between the 
two states can be approximated as Markovian processes. 

In this work we focus on model reduction. Since the 
data needed for reconstruction is in the form of time se- 
ries, the procedure can be used as a way of nonlinear 
time series analysis [181 [13 ] . The latter is an active and 
under-developed area, with a main difficulty lying in how 
to construct the nonlinear function form. In the GLE 
formalism, however, the nonlinearity is given by the po- 
tential of mean force that is automatically obtained from 
data. Therefore in principle the difficulty of selecting the 
nonlinear function form does not exist. 

In conclusion, in this work we demonstrate the practi- 
cal feasibility and accuracy of using the GLEs to model 
the dynamics of a general dynamic system, i.e., with or 
without detailed balance constraint, in reduced dimen- 
sion. Further developments are necessary to test the 
formalism in different systems and make the algorithm 
practically useful on studying complex systems. 
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Appendix A: An Analytical Example 

Here we consider a system-bath Hamiltonian, 



P 2 x 2 b A 

H = — H h -x 4 

2 2 4 



?{M(.-3-)'} 



(Al) 

Zwanzig discussed a nonlinear GLE for the system coor- 
dinates {x. p} obtained by directly solving the equations 
of motion [9, ^01 , 



dx(t) 

dt 
dp(t) 

dt 



= P(t) 

= -x(t) - bx(t) 3 



dsK N {s)p(t -s)+F N (t) (A2) 



The memory kernel and the random force terms are given 
by 



K N (t) = y^ -^ cos(wjt) 

F N (t) 



(A3) 



E, , sin. Wit 
7^(0)—^ 



+I> U-(o) 

3 \ 



lj 



x(0) cosojjt (A4) 



with the fluctuation-dissipation relation, 

< F N (t)F N {t') > = k B TK N (t - t'). (A5) 

The average is over an equilibrium heat bath with the 
system constrained at {x(0),p(0)}. By projecting to the 
Hilbcrt space (x, v) with Mori's procedure, one can also 
obtain a linearized GLE [5j, 



dx(t) 

dt 
dp{t) 

dt 



= v(t) 
= -wjfc(t) 



dsK L (s)p{t-s)+F L {tjA6) 



Where uj 2 = fc^T/ < x 2 >, and the the random force and 
memory kernel terms are also related by the fluctuation- 
dissipation relation 



< F L {t)F L (t') > = k B TK L {t - t') 



(A7) 



However in this case, the average is over the uncon- 
strained thermal equilibrium distribution. Effects of the 
nonlinear term — bx(t) 3 are contained in the renormalized 
coefficent uiq, the memory kernel, and the random force 
terms. 

In the following discussions, we will generalize the 
projection procedure of Mori by choosing a basis set 



TABLE I: Parameters table for the full model simulation of the ER-GFR sig 


naling switch. 


Parameter 


Description 


Value 


kdsER:E2 


Dissociation of ER : E2 and ERP : E2 


O.OOlmm -1 


k as ER:E2 


Association of ER : El and ERP : E2 


0.001nM -1 mm _1 


kdpERP 


Dephosphorylation of ERP and ERP : E2 


0.001mm -1 


JdpERP 


Michaelis constant for dephosphorylation 


1.8nM 


kpER 


Phosphorylation of ER and ER : E2 


0.0011mm" 1 


JpER 


Michaelis constant for phosphorylation 


3nM 


k s ER 


Production rate of ER 


O.OOlnMrnin- 1 


kdER 


Degradation rate of ER 


lO^mm -1 


[E2] 


Extracellular concentration of estrogen 


0.003nM 


7 


Time-scale for GFR activation 


10 -6 mm _1 


a 


Sigmoidicity of GFR response function 


1.9 


UJO 


Basal inactivation of GFR 


-0.83 


Wl 


GFR inactivation by ER : E2 


-0.5 


LU 2 ,L0 3 ,LU4 


GFR activation by ERP : E2, ERP and GFR 


0.5,0.5,1.1 x 10~ 3 


Pl,P2,p3,Pi,P5 


Amplitudes of noise for ER, ERP, ER : E2, ERP : E2 and A G fr 


0.05,0,0,0,0.15 



{x, x 3 , v}. Functions with even powers of a; makes no con- However, one also has, 
tribution to the projection ((Lp, x 2n ) = 0, n — 1,2,...). 
Therefore the lowest nolinear basis function is x 3 . 
First, 



< Lp, x n > 



Lp 



bx 



Lx = p, Lx u 



= 3px 

x - q. 



(A8) 



< x 



n+l 



> -b < x 



n+3 



> 



J^7i < X n (j^X ~1i) > 
i 

< X 



n+1 >-b< x n+3 > (A13) 



let's calculate the normalization matrix, 



Therefore, 



A- 1 = 



< x 2 > < x 4 > < xp> 

< x 4 > < x 6 > < x 3 p > 

< xp > < x 3 p > < p 2 > 

< x 6 > /h - < x 4 > /h 
- < x 4 > jh < x 2 > /h 






< p 2 > 



(A9) 



< x 4 > 



< X" > = 



-(wg - 1) < x 2 > 
3 

1 



2 „ 2 ^ , 2 ^ 
Uln < X >< X > 



b 2 



(Wq - 1) < x 2 > 



(A14) 



(A15) 



Where h =< x 2 >< x 6 > — < x 4 > 2 . The memory 
function and the random force in the equation of motion 
of x vanish, which can be seen from, 



Lx 

One has, 
(Lp,x n ) 



(Lp, x) 



((Lx,x) (Lx,x 3 ) (Lx,p))-A l 

p (A10) 



1 



Jexp(-PH)dx 
k B T 
J exp(~f3H)dx 
knT 



x n —exp(-l3H)dx 



x n —exp(~(3H)dx 
ox 



Jexp(-(3H)dx 

nks'- 
k B T 



n.r 



n-l 



-nk B T < x™ -1 > 



2 2 

Wq < x > 



exp(—f3H)dx 

(All) 
(A12) 



Then, 



PLp(t) 



-x — bx 



(A16) 



One can easily show that the random force (through 
dF/dt = (1 — P)LF) and memory kernel (through Eqn. 
lU terms are the same as those given in Eqs. IA3I and IA4I 
, although in general here the average perform in Eqn. [S] 
is over the unconstrained thermal equilibrium distribu- 
tion. Therefore with the Mori projection procedure we 
recover Eqs. IA2I IA31 IA4I obtained by exact integration. 
Following similar procedure, one can show that further 
expanding the basis functions to include higher orders of 
x n does not change the projected equation form. The 
above results can also be obtained by applying Eqs. |H 
EHm directly. 



Appendix B: Implementing the stochastic full model 

simulation for ER-GFR signaling switch and 

reconstructing the GLE 

The rate equations for the ER-GFR signaling switch 
are taken from 15| and are listed below; 



J 



d[ER] rppij.1. [PR-F91 b rFPUFoi , k d P E R p[ERP] k pER A GFR [ER] 

T7 — — &sER — KdER[&n\ + KdsER:E2[^n ■ &Z\ - K as ER:E2[&n\[£jl\ + — 

ai JdpERP + [ERP\ J P ER + [ER\ 

d[ERP] rpppi i. [prp-p91 t [PRPHP91 k d P ERp[ERP] k pER A GFR [ER] 

-77 - -k dE R[ERP\ - k dsER . E2 [ERP . El\ - k asER , E2 [ERP\[E2\ - , , H 7 , tfpi 

at JdpERP + [nKt'l JpER + [-C/-KJ 

&^1 = -Jfe^H^ : E2] k dsE R,E-AER : S2] + k asER:E2 [ER}[E2} + ^ j'f ' ^ 
fcpBfl^Gf r[ER : £2] 



Jdp£i?p + [ERP : E2] 



JpER + [ER : E2] 



d[ERP : E2] 



dt 



+ 



k dER [ERP : E2] - k dsER .. E2 [ERP : E2] + k asER:E2 [ERP][E2] - ^^^'^^l 

JdpERP + [tLKF : h,l\ 

k pER A GFR [ER : E2] 



JpER + [ER : E2] 



dA 



GFR 



dt 



]_ 1 e -(T(oj +u!i[ER:E2]+U!2lERP:E2]+0J3lERP]+u!i[GFR])\ 



-1 



.4 



GFR 



(Bl) 



r 



where, [ER] is the concentration of unbound estro- 
gen receptor; [ERP], the concentration of unbound 
phosphorylated estrogen receptor; [ER : E2], the 
concentration of estrogen receptor bound to estrogen 
([-E2]); [ERP : E2], the concentration of phosphorylated 
estrogen receptor bound to estrogen and A GER is 
defined as the activity of growth factor receptor defined 
as logio[GFR]. Table Q] lists all the model parameters. 

Following Tyson et al, we convert the deterministic 
rate equations Eqn. IB1I to chemical Langevin equations 
as follows. Specifically we add nonzero noise terms to 
the equations of ER and Acer, 



dX, 
~~dl 



— — — \-{Si — X i )+R i (Xi, ■■■,Xi, •••,X„)+i 



(B2) 

where A^ determines the time scale of synthesis and 
degradation reaction of species Xi, Si represents the 
steady state level of species Xi , Ri indicates the rate of 

all other reactions affecting Xi, pi — y ((Si — Xi) ) eq , 
and £i is a Gaussian noise term. Only fluctuations 
arising from the processes of protein synthesis and 
degradation are added to the model; noise terms 
arising from the fast association disassociation and 
phosphorylation-dephosphorylation reactions are ig- 
nored. In our simulations, a hard reflecting barrier is 
imposed at for all chemical species (but not for Aqfr), 
so that only positive concentrations are involved. For 



Agfr — logio[GFR], a negative value corresponds to 
concentration of GFR less than 1, so there is no scope of 
negative concentration of GFR. 



For reconstruction of the GLE, we choose Aqfr as 
our species of interest. The chemical Langevin equa- 
tion, described by Eqns. IBll and IB2I with parameters 
from the Table HI is simulated for 2 x 10 6 mins with 
a time step of 1000 mins using initial concentration of 
[E2] = OnM, [ER] = 19.04nM, [ERP] = 3.15nM, 
[ER : E2] = [ERP : E2] = 0, A GFR = 0.936. Thereafter 
the stationary state distribution, p ss is obtained by sam- 
pling 1.4 x 10 7 points. The mean force potential, Fig[3j3, 
is obtained using the relation W = — ln(p ss ). 

To study the dynamics, A GFR is dragged out of 
stationary state and set to 1.3, while the concentration 
of the rest of the species are sampled from the original 
stationary distribution. To determine the memory kernel 
(ro and ri(i)) the mean potential is fit with 20 piecewise 
quadratic functions. Using the functional form of the 
mean force obtained via the quadratic fits and Eqn. [18] 
the memory kernel is determined for At = 50. We would 
like to point out that a large value of the penalizing 
factor in Tikhonov regularization results in inaccurate 
estimate of the Ti(t) while a small value accumulates 
error with time. Hence four different values of b are 
used for different intervals viz. b = 0.01,0.03,0.1,3 for 
t e (0,2000), (2 x 10 3 ,4 x 10 3 ), (4 x 10 3 ,5 x 10 4 ), and 



beyond 5 x 10 4 mins, respectively. Although b = 3 is 
a large penalization, the general signature i.e. a slow 
relaxation of the long tail, t > 5 x 10 4 mins, of the 
memory kernel remains intact. For the spectral density 
calculations, M, the number of time steps over which 
the pseudo-random forces repeat is chosen to be 1000 
for the GLE reconstruction with short memory kernel 
and 2 x 10 6 for the long one, corresponding to the 



trajectories shown in blue and red curves respectively in 
Fig[5p, with time step At = 50. 

Since these simulations are computationally expen- 
sive, the codes are parallelized using mpirun in athena 
cluster at Virginia Tech advanced research computing 
facility on 8 cpu nodes with 16 processors per node. 
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FIG. 1: (Color Online) Simulations with the end-product inhibition motif. (A) The wiring diagram, xi is selected as the 
primary degree of freedom. (B) The potential of mean force obtained from the stationary state distribution, W = — ln(p ss ). 
(C) The memory kernel, Ti obtained using Eqn. [52] with the Tikhonov regularization penalization factor b — 0.1. (D) The 
relaxation dynamics of the species xi obtained using the GLE compared to the full model simulation results, for initial value 
x\ — 2.0. The trajectory for the GLE is averaged over 50000 realizations, and that for the full model is averaged over 300000 
realizations. 
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FIG. 2: Comparison of the Full model vs. GLE results of the x\ relaxation dynamics for different starting points, a;i(0) = 1.3(A), 
0.87(B) and 0.54(C). The full model simulations are performed over 10 5 realizations. The same memory kernel as Fig. [TJis 
used for the GLE simulations. Each GLE result is performed over 10 4 realizations. 
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FIG. 3: (Color Online) Model studies of the ER-GFR signaling network. (A) The wiring diagram adapted from |15| . Each solid 
line starts with the reactant(s), and ends with the product(s). For a reaction with multiple reactants, each reactant is indicated 
with a filled solid circle. Each dashed line represents the influence, with a point arrow end for activation, and a blunt end for 
inhibition. The symbol <f> means degradation. (B) The potential obtained from the stationary state distribution, W ~ — ln(p ss ). 
(C) The memory kernel, Fi obtained using the full model simulation and smoothened using Tikhonov regularization with the 
penalization factor, 6 = 0.01 (black) - early behavior, 0.03 (red) - intermediate behavior and 0.1 (blue) - late behavior, and 
To — 1.0 x 10 6 . (D) Comparison of CLE with the full model simulation using the time evolution of x = Agfr = logio [GFR] . 
The initial value of x = 1.3. The black curve is the full model simulation averaged over 10 6 realizations. The red and the blue 
curves are generated using the CLE with memory kernel for 10 8 and 5 x 10 4 time steps, respectively, each averaged over 10 3 
realizations. The better agreement of the red curve justifies the importance of the long tail of the memory kernel. 
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FIG. 4: (Color Online) Distributions (P(r)) of the first passage time (t) from the shallow well (near Agfr — x — 1) to the 
deep well (near Agfr = x = 0) using full model and the generalized langevin equation for the ER- GFR switch. Note that the 
x-axis is the natural log of the first passage time. 



